Novel peptide identification from tandem mass spectra using ESTs and sequence database compression
Abstract
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Traditional search engines, which match peptide sequences with tandem mass spectra to identify the samples' proteins, use protein sequence databases to suggest peptide candidates for consideration. Although the acquisition of tandem mass spectra is not biased toward well-understood protein isoforms, this computational strategy is failing to identify peptides from alternative splicing and coding SNP protein isoforms despite the acquisition of good-quality tandem mass spectra. We propose, instead, that expressed sequence tags (ESTs) be searched. Ordinarily, such a strategy would be computationally infeasible due to the size of EST sequence databases; however, we show that a sophisticated sequence database compression strategy, applied to human ESTs, reduces the sequence database size approximately 35-fold. Once compressed, our EST sequence database is comparable in size to other commonly used protein sequence databases, making routine EST searching feasible. We demonstrate that our EST sequence database enables the discovery of novel peptides in a variety of public data sets.
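The abstract does not describe the compression algorithm itself, so the following is only a minimal sketch of why redundant sequence collections compress well for peptide search, not the authors' method. It assumes that a search engine only needs each distinct peptide-length substring once, and it compares the total number of length-k windows in a toy set of overlapping sequences with the number of distinct ones. The function names, the toy sequences, and the choice of k are all hypothetical.

```python
def distinct_kmers(sequences, k):
    """Return the set of distinct length-k substrings across all sequences."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def total_kmers(sequences, k):
    """Count every length-k window, including repeats."""
    return sum(max(len(seq) - k + 1, 0) for seq in sequences)


def redundancy_ratio(sequences, k):
    """Total windows divided by distinct windows (0.0 if there are none)."""
    distinct = len(distinct_kmers(sequences, k))
    return total_kmers(sequences, k) / distinct if distinct else 0.0


# Hypothetical toy data: three overlapping amino-acid fragments,
# standing in for translated ESTs that cover the same transcript.
toy = ["MKTAYIAKQR", "KTAYIAKQRQ", "MKTAYIAKQR"]

print(total_kmers(toy, 5), len(distinct_kmers(toy, 5)))
print(round(redundancy_ratio(toy, 5), 2))
```

The more the input sequences overlap, the larger this ratio becomes, which gives an intuition for how a highly redundant EST collection could shrink substantially once repeated substrings are stored only once.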
Related articles
Sequence Database Compression for Peptide Identification from Tandem Mass Spectra
The identification of peptides from tandem mass spectra is an important part of many high-throughput proteomics pipelines. In the high-throughput setting, the spectra are typically identified using software that matches tandem mass spectra with putative peptides from amino-acid sequence databases. The effectiveness of these search engines depends heavily on the completeness of the amino-acid se...
Error-tolerant EST database searches by tandem mass spectrometry and multiTag software.
The MultiTag method (Sunyaev et al., Anal. Chem. 2003, 75, 1307-1315) employs multiple error-tolerant searches with peptide sequence tags (Mann and Wilm, Anal. Chem. 1994, 66, 4390-4399) for the identification of proteins from organisms with unsequenced genomes. Here we demonstrate that the error-tolerant capabilities of MultiTag increased the number of peptide alignments and improved the confid...
ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data.
With the recent quick expansion of DNA and protein sequence databases, intensive efforts are underway to interpret the linear genetic information of DNA in terms of function, structure, and control of biological processes. The systematic identification and quantification of expressed proteins has proven particularly powerful in this regard. Large-scale protein identification is usually achieved...
Chemical rule-based filtering of MS/MS spectra
MOTIVATION: Identification of proteins by mass spectrometry-based proteomics requires automated interpretation of peptide tandem mass spectrometry spectra. The effectiveness of peptide identification can be greatly improved by filtering out extraneous noise peaks before the subsequent database searching steps. RESULTS: Here we present a novel chemical rule-based filtering algorithm, termed CRF,...
MultiTag: multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry.
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence database...
Volume 3
Publication date: 2007